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ABSTRACT 

Predicting ambulance demand accurately at fine time and lo¬ 
cation scales is critical for ambnlance fleet management and 
dynamic deployment. Large-scale datasets in this setting 
typically exhibit complex spatio-temporal dynamics and spar¬ 
sity at high resolutions. We propose a predictive method 
using spatio-temporal kernel density estimation (stKDE) to 
address these challenges, and provide spatial density predic¬ 
tions for ambulance demand in Toronto, Canada as it varies 
over honrly intervals. Specifically, we weight the spatial ker¬ 
nel of each historical observation by its informativeness to 
the current predictive task. We construct spatio-temporal 
weight functions to incorporate various temporal and spatial 
patterns in ambulance demand, including location-specific 
seasonalities and short-term serial dependence. This allows 
us to draw out the most helpful historical data, and exploit 
spatio-temporal patterns in the data for accurate and fast 
predictions. We further provide efficient estimation and cus¬ 
tomizable prediction procedures. stKDE is easy to use and 
interpret by non-specialized personnel from the emergency 
medical service industry. It also has significantly higher sta¬ 
tistical accuracy than the current industry practice, with a 
comparable amount of computational expense. 

Categories and Subject Descriptors 

H. 2.8 [Database Management]: Database Applications— 
Data mining; G.3 [Probability and Statistics]: Nonpara- 
metric statistics 

Keywords 

kernel density estimation; non-homogeneous Poisson point 
process; emergency medical service 

I. INTRODUCTION 

The emergency medical service (EMS) industry aims to 
minimize response times to emergencies and keep opera¬ 
tional costs low. Accurate spatio-temporal demand predic- 
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tions are critical to staff / fleet management and dynamic 
deployment. Predictions are required at high temporal and 
spatial resolutions for operational decision-making; the in¬ 
dustry typically predicts for every hour and every 1 km^ 
region. We are motivated to predict spatio-temporal ambu¬ 
lance demand in Toronto, Canada. 

There are several typical challenges to predicting ambu¬ 
lance demand. First, ambulance demand data is often sparse 
at the prediction resolution. For instance, Toronto receives 
only 23 highest priority events per hour on average; 96% 
of the 1 km^ spatial regions in Toronto have zero event in 
any hour. Also, ambulance demand frequently exhibits com¬ 
plex temporal dynamics, some of which are location-specific. 
For example, ambulance demand in Toronto has weekly and 
daily seasonality and short-term serial dependence of a few 
hours |12| |25| . Among these patterns, daily seasonality and 
short-term dependence are more pronounced downtown and 
in denser neighborhoods [^. Lastly, ambulance demand 
data in large cities is usually large-scale; Toronto dispatches 
for about 200,000 priority events per year. This may present 
computational challenges, especially since predictions are 
needed hourly. 

The current industry practice for predicting ambulance 
demand often uses a simple averaging formula. Demand 
in a 1 km^ spatial region over an hour is typically pre¬ 
dicted by averaging a small number of historical counts, 
from the same spatial region and over the corresponding 
hours from previous weeks or years [^. In current practice, 
Toronto EMS averages four historical counts in the same 
hour of the year over the past four years, while the EMS of 
Charlotte-Mecklenburg, North Carolina averages twenty his¬ 
torical counts in the same hour of the preceding four weeks 
for the past five years (the MEDIC method) . Averaging 
so few historical counts, which are mostly zeros, produces 
highly noisy and flickering predictions, resulting in haphaz¬ 
ard and inefficient deployment. Such methods are also sen¬ 
sitive to how the spatial domains are partitioned [^. 

Many studies have accurately predicted aggregate ambu¬ 
lance demand as a temporal process. These studies have 
considered autoregressive moving average models [^, factor 
models and spectrum analysis [^. However, very few 
studies have modeled spatio-temporal ambulance demand 
well. One study uses artificial neural networks (ANN) to 
predict ambulance demand on discretized time and space, 
but fails to improve predictive accuracy over typical indus¬ 
try practice due to data sparsity 20 . Another more re¬ 


cent study predicts ambulance demand in discrete time and 
continuous space 1^. It proposes a time-varying Caussian 



mixture model (GMM) to incorporate location-specific tem¬ 
poral patterns in the demand, giving higher predictive ac¬ 
curacy than industry approaches. However, its estimation 
procedure via Bayesian sampling may present computational 
challenges and require special expertise, making it difficult 
to use by non-specialized personnel from the EMS industry. 

Kernel density estimation (KDE) is a powerful tool for 
non-parametric density estimation in spatio-temporal data. 
It has been widely applied to visualize or forecast spatio- 
temporal crime incidence [^, disease spread [M 23 , 
product demand [TT], and data streams [T| [Tb]. In most 


cases, the time dimension is treated differently from the 
space dimension(s). The most traditional approach is to 
build a separate spatial KDE for each discrete time period. 
However, this approach may result in uneven subset size and 
sparse subsets with too little data for accurate density es¬ 
timates. Recent studies assume a multiplicative orthogonal 
relationship between the time and space dimensions. For ex¬ 
ample, [H multiplies a spatial kernel with a linear function 
in time. Studies such as |14[ |24| multiply a spatial kernel 
and a temporal kernel with different bandwidths and kernel 
functions for the two kernels. 

Extending these multiplicative spatio-temporal KDE meth¬ 
ods, we propose a fast and accurate method for predicting 
spatio-temporal ambulance demand that is practical and 
scalable in industrial settings. We follow in predict¬ 
ing in discrete time and continuous space. We propose a 
novel specification of spatio-temporal kernel density estima¬ 
tion (stKDE). First, we learn parametrically the temporal 
and spatial characteristics of the demand. Each historical 
observation is annotated with a weight based on what we 
have learned. This spatio-temporal weight function scales 
how helpful different historical observations are to a given 
predictive task. Then we construct a spatial kernel density 
estimator weighted by the informativeness weight function, 
and use the resulting kernel density estimates as predictions. 
In this way, we efficiently emphasize the historical data most 
important to prediction and, as far as possible, exploit the 
spatial and temporal characteristics in the data. This ap¬ 
proach has three main advantages 


(a) accessibility: stKDE is fully automated and robust. It 
is easy to interpret and use by non-specialized person¬ 
nel, while approaches such as ANN or GMM may need 
special expertise (e.g., tuning, MCMC diagnostics). 

(b) efficiency: stKDE has low computational complexity. 
It is faster than GMM; inferring latent component la¬ 
bel in GMM is costly. It can afford more frequent 
parameter estimation updates and online predictions. 

(c) accuracy: stKDE gives more accurate predictions than 
current industry practice with similar computational 
expense. It also outperforms naive KDE methods (and 
ANN via [20| ); it is at least as accurate as GMM. 

We implement this method on ambulance demand data 
from Toronto, Canada from years 2007 and 2008. The data 
consist of 391,296 priority emergency events received by 
Toronto EMS for which an ambulance was dispatched. Each 
record contains the time and the location to which the am¬ 
bulance was dispatched. This includes some calls not requir¬ 
ing lights-and-sirens response, but does not include sched¬ 
uled patient transfers. We include only the first event in 
our analysis when multiple responses are received for the 


same event; explanatory analysis did not reveal any spatial 
or temporal pattern for these cases, and we treat them as a 
single ambulance dispatch. We have removed all calls with 
no times or locations. There was no call received for more 
than two hours on March 10, 2007 due to a recording sys¬ 
tem malfunction, and we have also removed all calls from 
that day. These removals totaled less than 2% of the data. 
For example. Figure 1 shows the locations of all ambulance 
demand from January to July 2008. 

Our proposed method, stKDE, gives significantly more ac¬ 
curate predictions than current industry practice with simi¬ 
lar computational expense. It also compares favorably to the 
state-of-the-art in ambulance demand prediction research. 

We propose the stKDE model in §2 and discuss compu¬ 
tational methods in §3. We show the empirical results on 
Toronto ambulance demand in Section §4, and conclude in 
§5. 


2. METHODOLOGY 

We model Toronto’s ambulance demand on a continuous 
spatial domain 5 C and a discretized temporal domain 
of one-hour intervals T = {1, 2,...}. Let St,i be the location 
of the i-th ambulance demand arising from the t-th time 
period, for i € {l,...,nt}, where nt is the total number 
of ambulances demanded in the t-th period. Since a non- 
homogeneous Poisson process (NHPP) is a natural model for 
a spatial point process [^[^, we assume {st,i : i = 1,... nt} 
for each time period t independently follow an NHPP over S, 
with positive intensity function At. We decompose intensity 
function as At(s) = Stft{s), for s £ S. Here, St = At(s) ds 
is the aggregate demand intensity over the spatial domain, 
and /t(-) is the continuous spatial density of the demand 
at time t such that /t(s) > 0 and ft{s)ds = 1. Hence, 

for each t, nt\Xt ~ Poisson(Jt) and st,i|At,nt /t(-) for 
i G {1,..., nt}. As mentioned before, numerous studies have 
proposed sophisticated and accurate methods for estimating 
{Jt}. We thus focus on predicting the spatio-temporal de¬ 
mand density {ft}, which has received far less attention in 
the literature. With the predicted {ft}, we can furthermore 
predict spatio-temporal demand volume by multiplying our 
{ft} with {(5t} from studies such as [12|. 


2.1 Spatio-Temporal Kernel Density Estima¬ 
tion 

Suppose we observe and utilize historical ambulance de¬ 
mand from a set of past time periods Tots, and we would like 
to forecast demand for a future time period u. We propose 
to predict /„ using a spatio-temporal weighted kernel den¬ 
sity estimator. We place a bivariate spatial kernel K at the 
location of each past observation in Tabs, and weight each 
kernel by the informativeness of the corresponding observa¬ 
tion in predicting for the wth time period. Specifically, we 
have for s G 5 


Ms) 


J2teTat^ u) Kh{s - st,i) 


( 1 ) 


Here, w{st,i,u) is the informativeness weight function multi¬ 
plied by the spatial kernel of the past observation st,i. This 
weight function is defined in detail in §2.2. K is the chosen 
bivariate spatial kernel with bivariate bandwidth H, i.e., 

Kh{s - st,i) = - st,i)) . 





2.2 Weight Function 

The weight function w aims to capture the utility of a 
past observation in predicting demand at a future period. 
Specifically, we would like to incorporate in w the spatial and 
temporal dependencies in the demand. Domain knowledge 
on EMS demand densities focuses our attention on weekly 
and daily seasonalities and short-term serial dependence of 
a few hours, which have varying strengths in different neigh¬ 
borhoods. 

We can therefore discretize the spatial domain into C large 
spatial cells, representing a rough division into neighbor¬ 
hoods. We assume that temporal dependencies within each 
cell remain constant in space. Let Wc denote the weight func¬ 
tion local to each discretized cell c£ {1,...,C}. Within this 
cell, we further assume that the informativeness of a past 
observation from time t in predicting for future time u only 
depends on how far back t is from u. We use the weight 
function to measure how positively correlated two demand 
densities {u — t) periods apart are in each cell. We model 
the weight function as 


/ u-t , u-t It,; \ To } /o\ 

Wc{u-t) = +P2,cP3,c P4,c . (2) 


P4,c 

for c € {1,..., C}. We combine all Wc to have 
c 


u) = J2 Mu - t) 1 {St,.6 cell c}- 


(3) 


Here, {pi,c}, {P 2 ,c}, {ps.c} and {p 4 ,c} for c £ {1,..., C} are 
all constrained to take values in [0,1]. We use a separate 
p parameter to capture each typical EMS patterns for easy 
interpretation and comparisons across locations (e.g., down¬ 
town vs suburbs) and times (e.g., winter vs summer). The 
term describes any potential short-term serial depen¬ 
dence. Its parametric form is the same as a stationary first- 
order autoregressive model, AR(1), and is also equivalent 
to the squared exponential function often used in Gaus- 
The term with p 3 ,c describes any po- 


17 


Sian processes 

tential daily seasonality with Ti = 24, whereas the term 
with p 4 ,c describes any potential weekly seasonality with 
r 2 = 24 X 7 = 168. The parametric form of these two 
terms corresponds to the periodic function used in Gaus- 
These two seasonality terms are multi- 
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Sian processes 

plied, and further discounted by a squared exponential func¬ 
tion, Finally, we sum the short-term dependency ef¬ 

fect and the seasonality effects. The different p terms are 
combined in similar to the typical approach to combining 
covariance functions in Gaussian processes. . There may be 
other weight functions that work similarly; we draw inspi¬ 
rations from Gaussian processes because these functions are 
well-studied and have some nice properties (e.g., infinite dif¬ 
ferentiability). This parametrization of the weight function 
is easy to interpret and visualize, and flexible to experiment 
with, even for non-experts. 

The weight function is bounded between 0 and 2. We 
avoid negative weights to avoid negative kernels in the kernel 
density estimator, which complicates bandwidth selection, 
results in negative density estimates that need to be floored 
at zero, and produces discontinuities in the derivatives of the 
estimates [^. The magnitudes of the weights are nominal, 
as long as they are comparable across all C regions, since 
they are normalized in Equation < 0 - 

Equations g, g and g together form the model. 


2.3 Computational Methods 

We must select or estimate the kernel function K and 
bivariate bandwidth H in Equation 0. as well as the spa¬ 
tial discretization C and 4(7 number of p parameters in the 
weight function 0. Since the nature of ambulance demand 
does not change drastically over time, these estimations may 
be performed infrequently in practice (at most several times 
a year). 

For K, we can use the typical Gaussian kernel, or for 
additional computational savings, the Epanechnikov kernel 
with bounded support. We can select the bandwidth H via 
the plug-in method or smoothed cross-validation [^. We 
can also adopt one of many fast computational methods for 
KDE, including kd-trees [^, ball trees [^, dual trees [w] 
and statistical regular pavings [18| . 

For the weight function 0, we can choose the discretiza¬ 
tion mesh OT C a priori or via cross validation. A larger 
value of C allows personalized temporal patterns on a finer 
grid, but if C is too large, data may become too sparse for 
accurate estimation of temporal dependencies. In our appli¬ 
cation, C is best chosen to be close to 20, yielding discrete 
regions that are roughly 5 km by 5 km each. The 4(7 num¬ 
ber of p parameters in the weight function could be chosen 
in a number of standard ways; for instance we could use 
stochastic gradient ascent to maximize the joint likelihood 
of training data. For accurate estimation, we would need 
to use training data with tens of thousands of observations, 
and incur non-trivial computational cost. Here we introduce 
a much faster alternative method to estimate these param¬ 
eters. 

In Equation 0. Wc measures how positively correlated 
two demand densities (u — t) periods apart are at cell c. We 
can directly estimate this correlation as follows. For each 
cell c, we can approximate its demand density for any pe¬ 
riod by the proportion of observations arising from this cell 
out of all observations from that period. We can then ob¬ 
tain a time series of proportions and compute its (discrete) 
autocorrelation function Ac{£) for lag £ £ {1,..., L}, where 
L is the maximum lag considered. Typically L can be taken 
to be around several weeks (hundreds of one-hour periods). 
The non-negative part of this autocorrelation, Ac{£), or a 
smoothed version of it, is precisely what Wc aims to capture. 
For example. Figure 2 (a) shows an example of the autocor¬ 
relation function Ac{£), and the grey lines in Figure 2 (b) 
shows the corresponding A^ (£), for £ £ {!,..., 672} (up to 
4 weeks of one-hour periods). 

The goal is to find appropriate p parameters such that 
Wc best fits the shape of Ac ■ To do this, we would like 
to minimize the sum of squared errors between po,cWc{£) 
and A^ (£) at all time lags £ from 1 to L. We can find 
the optimal po,c to pa,c for this minimization using gradient 
descent or grid search. The extra parameter is needed 
to scale Wc to curve-match the magnitude of A^, and is of 
no real significance. Of greater importance is curvature or 
shape of Aj, which is captured in pi,c to p4,c. To make Wc 
comparable across all (7 cells, we need to normalize Wc such 
that the area under Wc up to L is the same across different 
cells. 






In summary, we estimate the p parameters in C minimiza¬ 
tion problems: for each c € {1,..., C}, 


L 

wTm 4r ^ “ Po.cW,{£)f 

L 

S.t. Wc(£) = 1- 

r=i 


(4) 


This computation is much more efficient than the joint es¬ 
timation of 4C parameters by maximizing likelihood. Here, 
we do not need to involve the kernel density estimator, nor 
loop through tens of thousands of ambulance demand obser¬ 
vations. We can easily compute the C minimization prob¬ 
lems in parallel. For each cell, we have a low-dimensional (5 
parameters) problem with a small number of observations 
L (around hundreds of hours of time lags). A wide array of 
standard algorithms for solving optimization problems can 
be applied. For example, we can Lagrangian relax the con¬ 
straint into the objective and use the genetic algorithm or 
particle swarm. 

Once the infrequently performed parameter estimation is 
done, predictions for any future time period can be calcu¬ 
lated instantaneously using short sliding windows of length 
L. We can additionally refine or customize the prediction 
procedure in the following two ways. 

First, to boost predictive accuracy, we can bilinearly in¬ 
terpolate the weight values smoothly over the spatial do¬ 
main instead of taking only C sets of values on a discretized 
grid. This is appropriate since we believe that the temporal 
patterns vary smoothly across the spatial domain. It also 
mitigates the sensitivity to predictions induced by choices 
of C. 

Secondly, we can impose an omission threshold value, O, 
for the weights. If the weight of a past observation st,i is 
below this threshold, i.e., if w{st,i,u) < O, we can omit this 
observation in the calculation of weighted KDE by overrid¬ 
ing w{st,i,u) — 0. The threshold can be chosen to balance 
the tradeoff between computational expense and predictive 
accuracy. 


3. PREDICTING TORONTO AMBULANCE 
DEMAND 

The computation has two stages. In the first stage, we 
estimate or choose all parameters, including the kernel K, 
bandwidth H, discretization C and 4C number of p pa¬ 
rameters. This estimation only needs to be performed in¬ 
frequently. For this parameter estimation, we use Toronto 
ambulance data from January to July 2008. Figure 1 shows 
the spatial locations of all observations from this 7-month 
period. In the second stage, we predict future ambulance 
demand on a sliding window of length L = 672 (4 weeks, 
around 15, 000 observations) for each one-hour period from 
August to December 2008. 

In estimation, we choose the Gaussian kernel for K, select 
the bandwidth H via the plug-in method and discretize 
Toronto into C = 21 equally-sized regions. We estimate 
the p parameters in the weight function using the method 
detailed in §3. As an example, we outline the cell c cover¬ 
ing downtown Toronto in Figure 1. We show in Figure 2 
(a) the autocorrelation function Ac for the proportions of 
observations arising from this downtown cell out of all ob¬ 
servations across hourly time periods. This autocorrelation 


function indicates weekly, daily seasonalities and low-order 
serial dependence. Figure 2 (b) shows in grey A'^ and in 
black the fitted weight function po,c'Wc for downtown, with 
pi,c = 0.95 (short-term serial dependence), p 3 ,c = 0.001 
(daily seasonality), p 4 ,c = 0.145 (weekly seasonality) and 
P 2 ,c = 0.9995 (discounting for seasonalities). The fitted 
weight function provides an interpretable basis for under¬ 
standing exactly which historical observations are the most 
important for prediction. For example, from Figure 2 (b), an 
EMS manager can recognize that at downtown, ambulance 
demand in the past day or two and corresponding hour of the 
past few weeks are the most important. Cross-correlations 
among the 21 weight functions estimated at different regions 
show that neighboring weight functions have some associa¬ 
tion, but those far apart are not correlated. 
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Figure 1: Spatial locations of all Toronto ambulance 
demand data from January to July 2008. To evalu¬ 
ate location-specific weight fnnctions, we discretize 
the spatial domain into 21 cells, and here we ontline 
the cell containing downtown Toronto. 

Once parameter estimation is done, we predict forward 
using a sliding window of 4 weeks for each one-hour pe¬ 
riod from August to December 2008. Figure 3 shows the 
predictive densities on August 6, 2008 (Wednesday) at two 
different time periods. The ambulance demand is, not sur¬ 
prisingly, concentrated at the heart of downtown during day 
time on Wednesday (Figure 3 (b)), and more spread out 
throughout the city during night time on Wednesday (Fig¬ 
ure 3 (a)). This illustrates that stKDE can differentiate 
temporal patterns at different time periods and locations. 

We compare stKDE to the following competing methods 

(a) The MEDIC method, which is an industry practice 
implemented in Charlotte-Mecklenburg, NC (§1). We 
implement this method as far as we have data. The 
cell count in a 1-km^ region and a 1-hour period is 
predicted by the average of corresponding cell counts 
in the preceding 4 weeks in the past two years. The 
log predictive density produced by MEDIC for August 
6, 2008 (Wednesday) at 2 - 3 am is shown in Figure 
4. Compared to Figure 3 (a), the MEDIC prediction 
appears much noisier. 

(b) Two naive KDEs, (i) using data from the most recent 
hour to predict the next hour, and (ii) using all data 
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Figure 2: (a) The autocorrelation function of the 

proportions of observations arising from the rect¬ 
angle in Figure 1 over all observations across one- 
hour periods, (b) The fitted weight function (black) 
against the nonnegative part of the autocorrelation 
function (gray). 


from the past four weeks with equal weights (this pro¬ 
duces a spatial only model, with almost no temporal 
variation). 

(c) A time-varying Gaussian mixture model. We quote 
results from implemented on Toronto data with 
different training / testing months and various model¬ 
ing specifications (e.g., number of components). The 
computational expense is considerable. 
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To compare the statistical predictive accuracies of our 
model and the comparison methods, we use the metric of 
average log score [^. This performance measure is widely 
used because it is a strictly proper scoring rule closely re¬ 
lated to Bayes factor and Bayes information criterion [^. It 
is the average log likelihood of test data, and is defined as 

Accuracy({s„,i}) = =- ’V] log/u(s„,i), 

^“GTtest ueTtest i = l 

in which Ttest are the test time periods, {su,i} are the test 
data, and fu{-) is the predictive density for the w-th period 
obtained either by various methods. Intuitively, it measures 
the probability that we observe test data given a model. 
Thus a less negative (higher) average log score indicates that 
the model is better at capturing the test data. 

We show in Table 1 the predictive accuracies produced 
by stKDE and the comparison methods. We present three 
variations of stKDE prediction: (i) using the estimated dis¬ 
cretized weight functions Wc as they are, (ii) spatially in¬ 
terpolating the estimated weight values, and (iii) imposing 
an omission threshold on the estimated weight values such 
that each prediction uses a comparable amount of data as 
the industry method (about 200 observations). 

The stKDE method significantly outperforms the MEDIC 
method (industry practice). It also outperforms the naive 
KDE methods, demonstrating the utility of incorporating 
spatio-temporal patterns via the weight functions. Our per¬ 
formance is comparable to time-varying GMM as it is im- 


Figure 3: Log predictive density using stKDE for 
Aug 6, 2008 (Wednesday) at (a) 2-3 am (demand 
more spread out at night) and (b) 2-3 pm (demand 
concentrated at downtown during the day). 
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Figure 4: Log predictive density using industry 
method for Aug 6, 2008 (Wednesday) at 2 - 3 am. 
Figure 3 (a) shows a less noisy density for the same 
period predicted by stKDE. 























Prediction method Accuracy 


StKDE 

+ interpolation 
+ threshold (less data) 

-6.106 

-6.102 

-6.635 

MEDIG 


-8.642 

naiveKDE 

most recent hour 

-6.921 


all equal weights 

-6.254 

GMM 


-6.072 to -6.149 


Table 1: Predictive accuracies of stKDE and com¬ 
peting methods. Results of GMM are quoted from 
[25| implemented on Toronto data with various 
training / testing months and model specifications. 


plemented on Toronto data with different training / testing 
months and modeling specifications. Among the three varia¬ 
tions of stKDE, allowing for bilinear interpolation of weight 
values improves the predictive accuracy slightly. In the third 
variation, including the omission threshold leads to a small 
loss of accuracy but reduces computational cost significantly 
to be comparable to the industry method. 

The infrequent estimation of weight functions and band¬ 
width takes several hours on a personal computer. This 
offline training is significantly shorter than that of GMM 
(inferring latent component labels in GMM is costly). It 
does not take much longer than estimating bandwidths for 
naive KDE methods. Once estimation is done, making each 
new prediction is instantaneous (a few seconds). We could 
further reduce the computational expense of stKDE by par¬ 
allelizing weight estimation, using a tree-based algorithm for 
fast KDE computation, using a bounded kernel function, 
or creating a look-up table of densities (none of these was 
done). 

4. CONCLUSIONS 

Fine-resolution spatio-temporal ambulance demand pre¬ 
dictions are crucial to optimal ambulance planning. The 
EMS industry practice and early studies either sacrifice pre¬ 
dictive accuracy for fast computation, or incur substantial 
computational cost in pursuit of high accuracy. We pro¬ 
vide a much-needed prediction method that is both accurate 
and fast. We predict spatio-temporal ambulance demand 
in Toronto with higher accuracy and comparable computa¬ 
tional cost as a typical industry practice. 

We propose a spatio-temporal weighted kernel density es¬ 
timator. The spatial kernel of each historical observation is 
multiplied with a weight value to indicate the informative¬ 
ness of this historical observation to the current predictive 
task. The spatio-temporal weight functions are inferred from 
dependencies in data, are unique to each neighborhood and 
can be updated regularly. This is an improvement from the 
ad hoc heuristic that only accounts for the weekly and yearly 
seasonality across the entire city. The weight functions are 
also flexible to represent various spatial and temporal char¬ 
acteristics. They are easy to experiment with, visualize and 
interpret by non-experts. Moreover, stKDE easily handles 
missing data by placing zero weight and scaling up weights 
on other data proportionally. It can also easily predict many 
hours or days into the future. 


The proposed method provides efhcient estimation of the 
weight function, and offers customizable prediction to bal¬ 
ance the tradeoffs between accuracy and computational cost. 
It is straightforward to implement by non-specialized users 
and scalable to large-scale datasets, and can be easily gen¬ 
eralized to a wide range of other applications with spatial- 
temporal point process data. 
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